Task Partitioning 


Introduction

• The task partitioning strategy of a parallel
computing system is a key factor in determining
the efficiency and speedup of the system. A
process is partitioned into subtasks, where the
size of each subtask is determined by the run-time
performance of each server.
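One way to read "size determined by run-time performance" is to give each server a share of the work proportional to its measured speed. The sketch below assumes this proportional rule; the function name `partition_by_speed` and the integer speed ratings are hypothetical and not taken from the slides.

```python
def partition_by_speed(total_items, speeds):
    """Split total_items into subtask sizes proportional to each server's speed.

    speeds holds positive integer ratings (hypothetical, e.g. items per second).
    """
    total_speed = sum(speeds)
    # Integer share for every server, rounded down.
    sizes = [total_items * s // total_speed for s in speeds]
    # Hand out the items lost to rounding, one each, fastest servers first.
    leftover = total_items - sum(sizes)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes


# Three servers, the third twice as fast as the other two.
print(partition_by_speed(100, [1, 1, 2]))
```

With equal speeds the rule reduces to an even split, which is the case shown in the data parallel slides.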


System Partitioning 

• The functionality of a system is implemented with a set
of interconnected system components, such as ASICs,
memories, CPUs, and buses.

• The designer must solve two problems: 

• select a set of system components (allocation), 

• partition the system’s functionality among these components
(partitioning).

The final implementation has to satisfy a set of design 
constraints, such as: 

• cost, 

• performance and 

• power consumption 


Structural Partitioning 


• First the system components are implemented using 
interconnected hardware components. 

• Partitioning separates the objects into groups, where each 
group represents a system component. 

• Mostly used at lower levels of abstraction for hardware 
partitioning. 

Satisfies certain constraints (for instance packaging). 

• Problems: 

- size/performance tradeoffs are difficult,

- large number of objects. 


Functional Partitioning 

• The system level functionality is partitioned in order to 
divide the behavior of the system between multiple 
components. 

• Usually an executable model is partitioned, so
parameters and partitioning results can be
estimated.

• Advantages: 

- size/performance tradeoffs, 

- small number of objects,

- hardware/software solutions. 


Types of Parallelism : Two Extremes 

• Data parallel 

• Each processor performs the same task on different data 

• Example - grid problems 

• Task parallel 

• Each processor performs a different task 

• Example - signal processing 

• Most applications fall somewhere on the continuum 
between these two extremes 

Basics of Data Parallel Programming


• One code will run on 2 CPUs

• The program has an array of data to be operated on by 2 CPUs, so the array
is split into two parts.

program:
    ...
    if CPU=a then
        low_limit=1
        upper_limit=50
    elseif CPU=b then
        low_limit=51
        upper_limit=100
    end if
    do I = low_limit, upper_limit
        work on A(I)
    end do
    ...
end program

CPU A

program:
    ...
    low_limit=1
    upper_limit=50
    do I = low_limit, upper_limit
        work on A(I)
    end do
    ...
end program

CPU B

program:
    ...
    low_limit=51
    upper_limit=100
    do I = low_limit, upper_limit
        work on A(I)
    end do
    ...
end program
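The pseudocode above can be sketched in runnable form as follows. This is only an illustration: two Python threads stand in for CPU A and CPU B, and squaring each element is a hypothetical stand-in for "work on A(I)", which the slide leaves unspecified.

```python
from concurrent.futures import ThreadPoolExecutor


def work_on(a, low_limit, upper_limit):
    """Same task for every worker: square a[low_limit..upper_limit] in place."""
    for i in range(low_limit, upper_limit + 1):
        a[i] = a[i] * a[i]


# Index 0 is unused padding so that indices match the 1-based A(1)..A(100).
A = list(range(101))

# Each worker gets its own index range, like CPU A and CPU B on the slide.
limits = {"a": (1, 50), "b": (51, 100)}

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(work_on, A, low, up) for low, up in limits.values()]
    for f in futures:
        f.result()  # re-raise any exception from a worker

print(A[50], A[51])
```

Both workers run identical code and differ only in the limits they receive, which is the defining property of the data parallel style.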

Basics of Task Parallel Programming

• One code will run on 2 CPUs 

• Program has 2 tasks (a and b) to be done by 2 CPUs 


program.f:
    ...
    initialize
    ...
    if CPU=a then
        do task a
    elseif CPU=b then
        do task b
    end if
    ...
end program

CPU A

program.f:
    ...
    initialize
    ...
    do task a
    ...
end program

CPU B

program.f:
    ...
    initialize
    ...
    do task b
    ...
end program
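A minimal runnable sketch of the same structure is shown below. The two threads stand in for the two CPUs, and `initialize`, `task_a` and `task_b` are hypothetical placeholders (a sum and a maximum) because the slide does not say what the tasks compute.

```python
from concurrent.futures import ThreadPoolExecutor


def initialize():
    """Shared setup that both CPUs perform (hypothetical data)."""
    return [3, 1, 4, 1, 5, 9, 2, 6]


def task_a(data):
    """Task a: placeholder work, here a sum."""
    return sum(data)


def task_b(data):
    """Task b: different placeholder work, here a maximum."""
    return max(data)


data = initialize()

# Each CPU is assigned a different task on the same data.
tasks = {"a": task_a, "b": task_b}

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {cpu: pool.submit(task, data) for cpu, task in tasks.items()}
    results = {cpu: f.result() for cpu, f in futures.items()}

print(results)
```

In contrast to the data parallel sketch, the workers here share the data and differ in the function they execute.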


Task partitioning strategy 



Task Partitioning 


signal S1, S2, S3, S4, S5, S6: INTEGER;

P1: process
    variable A, B: INTEGER;
begin
    A := (S1+5) * 3;
    B := S1+S2+7;
    S3 <= A*B;
end process;

P2: process
    variable X, Y: INTEGER;
begin
    wait on S3;
    S4 <= S3+X;
    ...
    wait on S5;
    S6 <= S5*Y;
end process;

P3: process
    variable Z: INTEGER;
begin
    wait on S4;
    S5 <= S4+Z;



Metrics and Estimations 


• Partitioning algorithms have to rely on a quantitative
measure of a candidate solution’s goodness.

• Metrics: attributes which characterize a given
solution; they are expressed quantitatively.

• Metrics include:

- cost,
- execution time,
- communication rates,
- power consumption,
- testability,
- reliability,
- program size,
- data size,
- memory size.


Metrics and Estimations 


• Estimation determines a metric value from a
rough implementation.

• Inaccuracy can be tolerated as long as the
relative goodness of any two partitions is
determined correctly.
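The point about tolerable inaccuracy can be illustrated with a weighted cost function. Everything in this sketch is an assumption made for illustration: the weighted-sum form, the weights, the metric values, and the uniform 20% overestimate are not from the slides.

```python
def cost(metrics, weights):
    """Weighted sum of metric values; a lower cost means a better partition."""
    return sum(weights[name] * value for name, value in metrics.items())


# Hypothetical weights and metric values for two candidate partitions.
weights = {"execution_time": 1.0, "communication": 0.5, "power": 0.2}
partition_1 = {"execution_time": 40.0, "communication": 10.0, "power": 20.0}
partition_2 = {"execution_time": 30.0, "communication": 30.0, "power": 25.0}


def rough_estimate(metrics):
    """A rough estimator that overestimates every metric by 20%."""
    return {name: 1.2 * value for name, value in metrics.items()}


exact_order = cost(partition_1, weights) < cost(partition_2, weights)
rough_order = cost(rough_estimate(partition_1), weights) < cost(
    rough_estimate(partition_2), weights
)
print(exact_order, rough_order)
```

The rough estimator gets every absolute value wrong, yet it ranks the two partitions the same way as the exact values, which is all a partitioning algorithm needs from it.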


Limits of Parallel Computing 

• Theoretical Upper Limits 

• Amdahl’s Law 

• Practical Limits 

• Load balancing 

• Non-computational sections 

• Other Considerations 

• Time to re-write code

Theoretical Upper Limits to Performance

• All parallel programs contain: 

• Serial sections 

• Parallel sections 

• Serial sections limit the parallel effectiveness 

• Speedup is the ratio of the time required to run a code on 
one processor to the time required to run the same code 
on multiple (N) processors 

• Amdahl’s Law states this formally

Amdahl’s Law

• Amdahl’s Law places a strict limit on the speedup that can be realized 
by using multiple processors. 

• Effect of multiple processors on run time

$t_N = \left( \frac{f_p}{N} + f_s \right) t_1$

• Effect of multiple processors on speedup

$S = \frac{t_1}{t_N} = \frac{1}{f_s + f_p / N}$

• Where

$f_s$ = serial fraction of code
$f_p$ = parallel fraction of code
$N$ = number of processors
$t_N$ = time to run on $N$ processors
$t_1$ = time to run on one processor
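The two relations can be written as small functions. This sketch assumes the usual convention that the serial and parallel fractions sum to one, so the parallel fraction is derived from the serial one; the function names are chosen here for illustration.

```python
def run_time(t1, fs, n):
    """Run time on n processors: t_N = (f_p / N + f_s) * t_1."""
    fp = 1.0 - fs  # assumption: serial and parallel fractions sum to 1
    return (fp / n + fs) * t1


def speedup(fs, n):
    """Speedup on n processors: S = 1 / (f_s + f_p / N)."""
    fp = 1.0 - fs
    return 1.0 / (fs + fp / n)


# A code that is 10% serial and takes 100 s on one processor.
print(run_time(100.0, 0.1, 10))
print(speedup(0.1, 10))
```

With no serial content (f_s = 0) the speedup equals the number of processors, which is the ideal case.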

Illustration of Amdahl's Law

It takes only a small fraction of serial content in a code to 
degrade the parallel performance. 
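The effect can be tabulated directly from Amdahl's Law. The serial fractions below are hypothetical sample values chosen for illustration; the processor count of 250 is likewise just a sample point.

```python
def amdahl_speedup(fs, n):
    """Speedup on n processors for serial fraction fs, with f_p = 1 - fs."""
    return 1.0 / (fs + (1.0 - fs) / n)


N = 250
for fs in (0.001, 0.01, 0.1):
    limit = 1.0 / fs  # speedup approached as n grows without bound
    print(f"fs={fs}: speedup on {N} processors = {amdahl_speedup(fs, N):.1f}, "
          f"limit = {limit:.1f}")
```

With only 1% serial content, 250 processors give a speedup of roughly 72, and no number of processors can push it past 100.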




Practical Limits: Amdahl’s Law vs. Reality

Amdahl’s Law provides a theoretical upper limit on parallel
speedup, assuming that there are no costs for communications.
In reality, communications will result in a further degradation
of performance.
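A rough sketch of this degradation is given below. The communication model is purely an assumption made for illustration: it adds a cost that grows linearly with the number of processors, and the parameter `comm_per_proc` is hypothetical. Real communication costs depend on the application and the machine.

```python
def amdahl_speedup(fs, n):
    """Ideal speedup from Amdahl's Law, with f_p = 1 - fs."""
    return 1.0 / (fs + (1.0 - fs) / n)


def speedup_with_comm(fs, n, comm_per_proc):
    """Speedup when communication adds comm_per_proc * n to the normalized
    run time (hypothetical linear model, not from the slides)."""
    return 1.0 / (fs + (1.0 - fs) / n + comm_per_proc * n)


fs = 0.01
for n in (10, 50, 100, 250):
    ideal = amdahl_speedup(fs, n)
    real = speedup_with_comm(fs, n, comm_per_proc=1e-4)
    print(f"N={n}: Amdahl = {ideal:.1f}, with communication = {real:.1f}")
```

Under this assumed model the speedup stays below the Amdahl limit at every processor count and eventually falls as processors are added, because the communication term keeps growing while the parallel term shrinks.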


[Figure: speedup (0 to 80) versus number of processors (0 to 250), Amdahl’s Law vs. reality]





THANK YOU ALL !!! 





